[deploy_website] enhance(stitch) canonical merged type/field definitions #2417

Merged
merged 17 commits into from
Jan 17, 2021

Conversation

gmac
Contributor

@gmac gmac commented Dec 29, 2020

Follows up the discussion of better support for merging type and field descriptions in gmac/schema-stitching-handbook#12. Interestingly, Federation actually has an approach to this problem given that their use of the "extends" keyword establishes a base for each type to prioritize.

This proposal more or less takes the inverse approach of the extends keyword... rather than extends identifying descriptions that we don't prioritize, let's simply flag the types we do want to reference as being "canonical".

Merge Candidates changes

  • Adds a canonical setting to MergedTypeConfig and the informal "MergedFieldConfig" type.
  • Makes the default type description merger select the canonical type's definition, or final definition.
  • Makes the default field merger select the canonical field's definition, or canonical type's definition, or final definition.
  • Adds EnumValue merger that selects a definition from the canonical type, or the final definition.

None of these changes are breaking from the current behavior. 🙌
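
As a rough illustration of the priority order listed above (canonical field, then canonical type, then the final definition), here is a minimal TypeScript sketch. The `FieldCandidate` shape and `selectFieldCandidate` are hypothetical names made up for this example; they are not the library's internal types or merger signatures.

```typescript
// Hypothetical, simplified candidate shape for illustration only.
interface FieldCandidate {
  subschema: string;        // subschema that supplied this definition
  description?: string;
  typeCanonical?: boolean;  // parent type flagged canonical in this subschema
  fieldCanonical?: boolean; // field itself flagged canonical in this subschema
}

// Priority: canonical field, then canonical type, then the final candidate.
function selectFieldCandidate(candidates: FieldCandidate[]): FieldCandidate {
  let fieldPick: FieldCandidate | undefined;
  let typePick: FieldCandidate | undefined;
  let lastPick: FieldCandidate | undefined;
  for (const candidate of candidates) {
    if (candidate.fieldCanonical) fieldPick = candidate;
    if (candidate.typeCanonical) typePick = candidate;
    lastPick = candidate;
  }
  const chosen = fieldPick || typePick || lastPick;
  if (chosen === undefined) {
    throw new Error('expected at least one candidate');
  }
  return chosen;
}

const fromProducts: FieldCandidate = { subschema: 'products', typeCanonical: true };
const fromReviews: FieldCandidate = { subschema: 'reviews', fieldCanonical: true };
const fromInventory: FieldCandidate = { subschema: 'inventory' };
const fromSearch: FieldCandidate = { subschema: 'search' };

console.log(selectFieldCandidate([fromProducts, fromReviews, fromInventory]).subschema); // reviews
console.log(selectFieldCandidate([fromProducts, fromInventory]).subschema); // products
console.log(selectFieldCandidate([fromInventory, fromSearch]).subschema); // search
```

With no canonical flags anywhere, the last candidate is still chosen, which is why the change is described as non-breaking.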

Stitching Directives changes

Adds a @canonical directive that can be applied to all merged schema types and fields.

"This is the canonical type description!"
type Product @key(selectionSet: "{ upc }") @canonical {
  # ...
}

type Review {
  # ...
  "This is the canonical field description!"
  body: String @canonical
}

Under the hood, these directives just write canonical attributes into merged type config for types and their fields. For simplicity, Enum values cannot override the canonical enum definition, which seems... fine (wanting to document the official values in two places without a backing implementation concern is an extreme edge case).
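
To make "write canonical attributes into merged type config" concrete, the sketch below folds a list of `@canonical` occurrences into a config object. Only the `canonical` setting and the per-field config come from this PR's description; the `CanonicalMark` record, the `*Sketch` interfaces, and `applyCanonicalMarks` are assumptions invented for illustration, and the real directive transformer is not shown here.

```typescript
// Hypothetical, trimmed-down config shapes; the real config has more settings.
interface FieldConfigSketch {
  canonical?: boolean;
}
interface TypeConfigSketch {
  selectionSet?: string;
  canonical?: boolean;
  fields?: { [fieldName: string]: FieldConfigSketch };
}
type MergeConfigSketch = { [typeName: string]: TypeConfigSketch };

// One record per @canonical found in the SDL; fieldName is absent for type-level use.
interface CanonicalMark {
  typeName: string;
  fieldName?: string;
}

// Writes canonical flags into the given config (mutates and returns it).
function applyCanonicalMarks(config: MergeConfigSketch, marks: CanonicalMark[]): MergeConfigSketch {
  for (const mark of marks) {
    const typeConfig: TypeConfigSketch = config[mark.typeName] || {};
    config[mark.typeName] = typeConfig;
    if (mark.fieldName === undefined) {
      typeConfig.canonical = true;
    } else {
      const fields = typeConfig.fields || {};
      typeConfig.fields = fields;
      const fieldConfig: FieldConfigSketch = fields[mark.fieldName] || {};
      fieldConfig.canonical = true;
      fields[mark.fieldName] = fieldConfig;
    }
  }
  return config;
}

// Mirrors the SDL above: Product is canonical as a type, Review.body as a field.
const mergeConfig = applyCanonicalMarks(
  { Product: { selectionSet: '{ upc }' } },
  [{ typeName: 'Product' }, { typeName: 'Review', fieldName: 'body' }],
);
console.log(JSON.stringify(mergeConfig, null, 2));
```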

@yaacovCR – thoughts on this? If you're on board with the feature, then I'll proceed with adding the corresponding docs. If not, we can table it at this stage. All told, I think this is a fairly elegant library-level solution to the descriptions problem.

TODO:

  • If this PR is a new feature, reference an issue where a consensus about the design was reached (not necessary for small changes)
  • Make sure all of the significant new logic is covered by tests
  • Rebase your changes on master so that they can be merged easily
  • Make sure all tests and linter rules pass

@changeset-bot

changeset-bot bot commented Dec 29, 2020

🦋 Changeset detected

Latest commit: 1d0138e

The changes in this PR will be included in the next version bump.

@gmac
Contributor Author

gmac commented Dec 30, 2020

Also, still contemplating how to handle selecting non-empty descriptions (which is generally desirable) in a logical and non-breaking manner...

@yaacovCR
Collaborator

Also, what if you want to declare a subschema as the "describer" of a field, but you want a different subschema to determine the nullability? Should we have multiple fine-grained directives?

@yaacovCR
Collaborator

In terms of avoiding empty descriptions, we could just change the default merge, I guess, not the worst breaking change.....

@gmac
Contributor Author

gmac commented Dec 30, 2020

> Hmm. Why not just go all out ownership and allow a directive that allows a subschema to declare itself as the "owner" of a type/field, which would matter for just descriptions, field merges, etc. What exactly is gained by allowing graded priorities?

Good question... mainly a matter of the ye-olde-arms-race of a type already marked as important and needing to be MORE important (this is why I hate CSS). I like the simplicity of an @owner boolean, and that terminology is very appropriate. Being able to apply that at both type and field levels seems like it covers the majority of cases.

> Also, what if you want to declare a subschema as the "describer" of a field, but you want a different subschema to determine the nullability? Should we have multiple fine-grained directives?

I'd say no... that gets more granular than the practical use case (there's always custom merged type handlers for crazy specific things). A field should be owned by one service. Actually, going down this road, the directive could be @canonical, meaning this is the canonical definition that gets built into the stitched schema. There's still the possibility to have multiple canonical definitions by accident, but that could just follow the last-one-wins pattern, and perhaps have a mergedTypeConfig option to error when multiple canonicals of the same element are encountered.

This is sounding a lot cleaner and more intuitive. Good talk.
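
A minimal sketch of that idea, assuming a hypothetical `errorOnMultiple` option (the actual option name and validation behavior are not specified in this thread): duplicates either follow last-one-wins or raise an error.

```typescript
// Hypothetical candidate shape and option name, for illustration only.
interface DefinitionCandidate {
  subschema: string;
  canonical?: boolean;
}

function pickCanonical(
  candidates: DefinitionCandidate[],
  options: { errorOnMultiple: boolean },
): DefinitionCandidate | undefined {
  const canonicals = candidates.filter((c) => c.canonical === true);
  if (canonicals.length > 1 && options.errorOnMultiple) {
    const names = canonicals.map((c) => c.subschema).join(', ');
    throw new Error('Multiple canonical definitions found in: ' + names);
  }
  // Last-one-wins among canonical candidates; undefined when none is flagged.
  return canonicals.pop();
}

const duplicated: DefinitionCandidate[] = [
  { subschema: 'users', canonical: true },
  { subschema: 'reviews' },
  { subschema: 'accounts', canonical: true },
];

const lenient = pickCanonical(duplicated, { errorOnMultiple: false });
console.log(lenient ? lenient.subschema : 'none'); // accounts

try {
  pickCanonical(duplicated, { errorOnMultiple: true });
} catch (error) {
  console.log((error as Error).message); // Multiple canonical definitions found in: users, accounts
}
```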

@gmac
Contributor Author

gmac commented Dec 30, 2020

See 678efaa... this is feeling a lot simpler and more intuitive. Thoughts?

@yaacovCR
Collaborator

Would users also expect "canonical" fields to always be delegated to the owning service?

@gmac
Contributor Author

gmac commented Dec 30, 2020

> Would users also expect "canonical" fields to always be delegated to the owning service?

I think that’s just a matter of explanation. This would strictly be a mechanism for assembling the gateway schema definition, and would have absolutely no impact on the runtime behavior. I think we could underscore that distinction in docs and make users understand the difference. The definition problem is still a difficult challenge without an easy solution right now. Also, it’s not like you need to manually specify everything as canonical. It’s really just for merged types (which naturally tend to have a primary definition in one service by nature of database design), and the option for a field override, which I’d expect to be pretty rare (we could do our entire gateway schema with just noting the base type definitions I suspect; I can’t picture a case in our stack where a field duplicated across services is more specific than one of our base types).

@gmac
Contributor Author

gmac commented Dec 30, 2020

Anywho... is this feeling too big or imposing for you? Not trying to push an agenda!

@yaacovCR
Collaborator

Just trying to feel out implications. Let's say we later wanted to add another directive then that establishes a field as only gettable from the given subschema. Kind of a provides negator/antonym. What would that be called, how would that work?

Would we be leaving enough space for that if canonical is added? Should that be a different directive? An argument to canonical?

@gmac gmac changed the title from "enhance(stitch) merged type/field priorities" to "enhance(stitch) canonical merged type/field definitions" on Dec 31, 2020
@gmac
Contributor Author

gmac commented Dec 31, 2020

Okay, @yaacovCR – I'm about done with my envisioned feature set here. Open to hammering out the nitty-gritty...

To your question about SDL, I think that can be broken down fairly simply by considering what each thing is fundamentally touching...

  • @canonical, as it is proposed, is focused on the gateway schema definition––not the runtime. It's unique in that it's really a compiler directive (if you think about building the merged schema as a form of compilation, which I do). The focus of this directive is on assembling type definitions in the shape that you want them, which is currently a bit under-represented as a concern in stitching's automation.

  • Then, this @anti-provides directive that you're proposing... (which is a cool idea, although I'm not convinced on the necessity of: why not just solve that problem with a transform that removes the ignored fields from the subschema? That seems more like "the stitching way"). Either way, that is very much a runtime concern of the query planner, therefore I don't see it having any real overlap with a compiler concern like canonical.

I'm with you on caution about lots of directives though... This @canonical proposal does make me sad in that it adds to the surface area, especially given that we're -1 on Federation SDL right now. That said, I think of Federation as really having five directives given that extends acts like one (and this "canonical" idea is very much its antonym).

@yaacovCR
Collaborator

We only have one directive @merge with new #2411

Unless you need complex keys...

The question of whether to use a transform versus a directive is about to get actually kind of complicated, as there are a lot of transforms like filtering that might make sense as directives....

@gmac
Contributor Author

gmac commented Dec 31, 2020

First pass at documentation added. I've got a few other places to work in detailed mentions, but for starters – how do you feel about how this is shaping up as a feature?

@yaacovCR
Collaborator

It's growing on me!

It looks like you have canonical fields overriding canonical types...

Is that obviously desired behavior? In general, if two different subschemas declare a field to be canonical, I think you error instead of warning and picking one... Is that desirable?

Just poking at this in my head, I think the choices you have made are pretty reasonable, just trying to figure out if there are any other options that make sense and maybe if it should be configurable....

@gmac
Contributor Author

gmac commented Dec 31, 2020 via email

@gmac
Contributor Author

gmac commented Jan 2, 2021

Okay @yaacovCR – calling this ready for final review. I left the validation errors in place for now because they can always be downgraded to warnings safely (while the reverse isn't true). Also did some additional housekeeping around the doc site. Overall, I'm pretty happy with the feel of the feature (and would probably try to refactor my stack to use it!)

@yaacovCR
Collaborator

yaacovCR commented Jan 7, 2021

No I'm thinking somewhat about the name only. @canonical is really ok...I am just wondering if we should consider some alternatives, just to say we did...

A general thought: we have some directives that are nouns (@key), some that are adjectives (@computed), some that are verbs (@merge). That is probably unfortunate, although I guess not a huge deal. One could argue that the next directive should have to be an adverb just to be consistently inconsistent?

Anyway, ideas:
@canonize -- easier to say? is a verb more appropriate to describe what we want here? doesn't matter?
@export -- i.e. export to wrapping schema as the canonical type -- I like that it signifies a bit for what purpose the field is canonical, but don't like the fact that all fields are really exported...
@imprint -- kind of like this, imprint this canonical type/field attributes on the wrapping schema, describes a bit more the effect of the canonical idea, avoids the loaded term export?

Thoughts?

@gmac
Contributor Author

gmac commented Jan 7, 2021

To the suggestions –

  1. I prefer canonical to canonize for reasons I'll go into below...
  2. export is probably my least favorite here because it's such a loaded term in software development, and this feels a bit incongruent to the normal concept.
  3. imprint is... fine. I don't find it intuitive by name alone, but I'd read the docs and understand what it does.

So to expand on my thinking for canonical – this is already a fairly established concept in networking given the notion of canonical URLs, ie:

<link rel="canonical" href="https://www.vox.com/culture/22218583/trump-movie-hollywood-capitol-insurrection-biden-hawley" />

It's sort of established nomenclature that the canonical definition is the thing observed above others, although no one version of the thing is technically wrong. That's pretty much a spot-on match to what this feature achieves, and that function is fairly intuitive to infer from the name alone.

@gmac
Contributor Author

gmac commented Jan 8, 2021

Poking at this a bit more, I’ve got a few more adjustments coming. Wrote up a doc on our own intended use cases the other day and that helped a lot to solidify what this feature needs to do. Will add some excerpts of that doc for context. To summarize further needs:

  • Description merger needs to be deprecated in v8 in favor of a new Type merger (returns a type candidate rather than a description... both can be supported for the time being).
  • Then, after selecting a canonical candidate, that candidate gets sorted to the end of the AST mergers. This assures that the canonical definition wins while incrementally merging elements, so we get canonical directive values and such.
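
The "sort the canonical candidate to the end" step can be sketched as below, assuming a simple last-wins incremental merge. `TypeCandidate`, `canonicalLast`, and `mergeDescriptions` are hypothetical names for this example and do not reflect the actual AST merger code.

```typescript
// Hypothetical candidate shape for illustration only.
interface TypeCandidate {
  subschema: string;
  canonical?: boolean;
  description?: string;
}

// Stable reorder: canonical candidates move to the end, the rest keep their order.
function canonicalLast(candidates: TypeCandidate[]): TypeCandidate[] {
  const others = candidates.filter((c) => !c.canonical);
  const canonicals = candidates.filter((c) => c.canonical === true);
  return others.concat(canonicals);
}

// Incremental last-wins merge: each later candidate overwrites what it defines.
function mergeDescriptions(candidates: TypeCandidate[]): string | undefined {
  let description: string | undefined;
  for (const candidate of canonicalLast(candidates)) {
    if (candidate.description !== undefined) {
      description = candidate.description;
    }
  }
  return description;
}

const productCandidates: TypeCandidate[] = [
  { subschema: 'products', canonical: true, description: 'A product in the catalog.' },
  { subschema: 'reviews', description: 'Product stub used by reviews.' },
];

console.log(canonicalLast(productCandidates).map((c) => c.subschema).join(' -> ')); // reviews -> products
console.log(mergeDescriptions(productCandidates)); // A product in the catalog.
```

Because the merge itself stays last-wins, reordering is the only change needed for the canonical definition to prevail.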

@@ -18,7 +18,7 @@ export function mergeEnum(
       : 'EnumTypeExtension',
     loc: e1.loc,
     directives: mergeDirectives(e1.directives, e2.directives, config),
-    values: mergeEnumValues(e1.values, e2.values, config),
+    values: mergeEnumValues(e2.values, e1.values, config),
Contributor Author

It appears that the enum type and value mergers are implemented with reversed argument order from each other. While the enum type merger assigns A -> B, the enum value merger assigns B -> A. That means you'll end up with type settings from one candidate and value settings from the other... :sigh:

This seems like the least invasive correction to make these align.
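
To show why mismatched argument order matters, here is a small self-contained sketch. Which argument wins in the real mergers is an assumption here; the two functions below are hypothetical stand-ins where the type-level merge favors the second argument and the value merge favors the first.

```typescript
// Hypothetical enum shapes for illustration only.
interface EnumValueSketch {
  name: string;
  description: string;
}
interface EnumTypeSketch {
  description: string;
  values: EnumValueSketch[];
}

// Stand-in value merger: the FIRST argument wins when both define a value.
function mergeValuesFirstWins(a: EnumValueSketch[], b: EnumValueSketch[]): EnumValueSketch[] {
  const result = a.slice();
  for (const value of b) {
    if (!result.some((existing) => existing.name === value.name)) {
      result.push(value);
    }
  }
  return result;
}

// Stand-in type merger: the SECOND argument wins for type-level settings.
function mergeEnumSketch(e1: EnumTypeSketch, e2: EnumTypeSketch, swapValueArgs: boolean): EnumTypeSketch {
  return {
    description: e2.description,
    values: swapValueArgs
      ? mergeValuesFirstWins(e2.values, e1.values)
      : mergeValuesFirstWins(e1.values, e2.values),
  };
}

const enumA: EnumTypeSketch = {
  description: 'Sizes from A',
  values: [{ name: 'SMALL', description: 'Small (from A)' }],
};
const enumB: EnumTypeSketch = {
  description: 'Sizes from B',
  values: [
    { name: 'SMALL', description: 'Small (from B)' },
    { name: 'LARGE', description: 'Large (from B)' },
  ],
};

const mismatched = mergeEnumSketch(enumA, enumB, false);
const aligned = mergeEnumSketch(enumA, enumB, true);

// Mismatched: type description from B, but SMALL keeps A's description.
console.log(mismatched.description, '|', mismatched.values.map((v) => v.description).join(', '));
// Aligned: both the type description and SMALL come from B.
console.log(aligned.description, '|', aligned.values.map((v) => v.description).join(', '));
```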

Collaborator

Makes sense to me for now, might make sense to open up separate issue on that for consistency, defer to @ardatan

@gmac
Contributor Author

gmac commented Jan 9, 2021

Okay, @yaacovCR – I've expanded this to be a lot more robust. It felt pretty squishy that this feature existed simply to service element descriptions. I've retooled some things so that it now prioritizes element definitions... those will include:

  • Element descriptions
  • Element directives (useful when stitching schemas that may include your own config settings, such as marking field visibility or assigning fields a query complexity score). In these cases, competing values should be read from the canonical definition.
  • Field deprecation reasons (technically rolls in with directives)
  • Field nullability: the canonical definition will have the final word.

Controlling full definitions makes this a considerably more useful feature for a wide range of needs. Thinking more on the name, @primary or @preferred are also terms that would make sense here, although I still like the parity of "canonical" with existing networking nomenclature.
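
As a hedged sketch of what "prioritizing element definitions" could look like for a single field: the canonical candidate has the final word on nullability and description, and on any directive that both candidates declare. The shapes, the `complexity` and `visibility` directive names, and the choice to keep non-competing directives from other candidates are all assumptions for this example, not confirmed implementation details.

```typescript
// Hypothetical shapes for illustration only.
interface DirectiveSketch {
  name: string;
  value: string;
}
interface FieldDefinitionSketch {
  subschema: string;
  canonical?: boolean;
  type: string; // e.g. 'String' or 'String!'
  description?: string;
  directives: DirectiveSketch[];
}
interface MergedFieldSketch {
  type: string;
  description?: string;
  directives: DirectiveSketch[];
}

function mergeFieldDefinition(candidates: FieldDefinitionSketch[]): MergedFieldSketch {
  if (candidates.length === 0) {
    throw new Error('expected at least one candidate');
  }
  // Canonical candidates are merged last so that they win any conflict.
  const ordered = candidates
    .filter((c) => !c.canonical)
    .concat(candidates.filter((c) => c.canonical === true));
  const merged: MergedFieldSketch = { type: '', directives: [] };
  for (const candidate of ordered) {
    merged.type = candidate.type; // nullability follows the last (canonical) candidate
    if (candidate.description !== undefined) {
      merged.description = candidate.description;
    }
    for (const directive of candidate.directives) {
      // Replace a same-named directive; keep directives that do not compete.
      merged.directives = merged.directives
        .filter((existing) => existing.name !== directive.name)
        .concat([directive]);
    }
  }
  return merged;
}

const mergedBody = mergeFieldDefinition([
  {
    subschema: 'content',
    canonical: true,
    type: 'String!',
    description: 'The review text.',
    directives: [{ name: 'complexity', value: '2' }],
  },
  {
    subschema: 'search',
    type: 'String',
    directives: [
      { name: 'complexity', value: '5' },
      { name: 'visibility', value: 'internal' },
    ],
  },
]);

console.log(mergedBody.type); // String!
console.log(mergedBody.directives.map((d) => d.name + '=' + d.value).join(', ')); // visibility=internal, complexity=2
```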

@gmac
Contributor Author

gmac commented Jan 9, 2021

AAAAAAAARG. Rebasing totally blew out this diff. Apparently I still don't know how to rebase properly. Might just open a new branch that fixes the diff. Anyway... lots of unrelated junk in here now. Should only be about 15 files.

@yaacovCR
Collaborator

yaacovCR commented Jan 9, 2021

git reflog to roll back rebase?

@yaacovCR
Collaborator

yaacovCR commented Jan 9, 2021

@gmac
Contributor Author

gmac commented Jan 10, 2021

Once again in pretty good shape here and ready for consideration!

@gmac gmac changed the title from "enhance(stitch) canonical merged type/field definitions" to "[deploy_website] enhance(stitch) canonical merged type/field definitions" on Jan 11, 2021
@gmac
Contributor Author

gmac commented Jan 14, 2021

@yaacovCR – I've thought of another major usecase that would be really nice for this feature to fulfill. I'd propose renaming the new setting/directive to @primary, and then a primary definition would also provide the delegation target of root fields in addition to its primary schema definition. For example:

Users service:

type User {
  id: ID!
  username: String!
  displayName: String
}

type Query {
  users(ids: [ID!]): [User]! @merge(keyField: "id") @primary
}

Reviews service:

type Review {
  id: ID!
  body: String
  author: User
}

type User {
  id: ID!
  reviews: [Review]!
}

type Query {
  reviews(ids: [ID!]): [Review]! @merge(keyField: "id") @primary
  users(ids: [ID!]): [User]! @merge(keyField: "id")
}

So basically in the above, each service is now allowed to have its own users query, and @primary specifies which one the gateway schema will proxy directly. This would remove the need for _users-style hacks that exist only to avoid hijacking the gateway namespace from the primary type service.

Thoughts? Would this pose any dire implications to the current implementation? If it seems reasonable, where in code is this selection of root fields handled? I get quickly lost within the delegation package.

@yaacovCR
Collaborator

root field names that conflict are a special case of object fields that conflict...

whereas for object field names that conflict, the query planner can use whichever schema appears more optimal, for root fields, the query planner usually has to select a given schema (unless the root fields are nested), so this directive makes sense.

on the other hand, the root fields could be nested -- would you want to let the query planner use whatever subschema seems most appropriate, or would you still insist on the primary?

I would also wonder about using a separate directive. I think users are now going to wonder why they can't use the primary directive to select a subservice for a given field (i.e. a reverse form of provides, in which by default we can get fields from all subschemas, but allow users to choose a particular one)

or maybe we should?

Just thinking out loud about the connection between canonical field definitions and primary subschemas for accessing those fields...

@gmac
Contributor Author

gmac commented Jan 14, 2021

Hmm, that’s interesting. I can see the case for keeping schema and implementation divorced for the reasons you cite. The case for a secondary directive makes some sense. Something like an “entrypoint” or “root” setting. I’ve never used nested root types before, so I’m not super familiar with how the optimal entry process would work. All told, seems like it should favor the optimal whenever possible.

@yaacovCR
Collaborator

yaacovCR commented Jan 15, 2021 via email

@gmac
Contributor Author

gmac commented Jan 15, 2021

I get what you’re saying. Let’s scratch this as an addendum to the canonical feature and leave this pr as is.

@gmac
Contributor Author

gmac commented Jan 15, 2021

Oh, I see. The query planner dynamically selects a root field to go to, doesn’t it? So in the above schema example, you can safely have both schemas define a “users” query, and it will automatically delegate to the better option based on other fields selected...? Do I have that right? If so, mind is blown YET AGAIN. I’ve been thinking of root fields as statically mapped entry points, where it will always visit the final field definition as the origin query.

@gmac
Contributor Author

gmac commented Jan 15, 2021

If I do have that right, then this whole topic is moot. No need to mark a preferred entrypoint if the planner is smart enough to pick the best one for the query.

@yaacovCR
Collaborator

The non-nested root fields are statically defined, but the nested ones should be treated the same as any other field (which is why, by the way, the resolver for root fields falls back to the default merged resolver if it detects an external parent).

@gmac
Contributor Author

gmac commented Jan 15, 2021 via email

@yaacovCR
Collaborator

Correct, I'm just wondering if this special case deserves its own directive. Even though it is static, still seems to me that it should be separate from the other canonical issues but interested in your thoughts

@gmac
Contributor Author

gmac commented Jan 15, 2021 via email

@yaacovCR
Collaborator

Makes sense to me! Docs should highlight.

@gmac
Contributor Author

gmac commented Jan 16, 2021

Huh, looks like the way this sorts everything already lined it all up so that the canonical root definition is the one that gets used. I just tinkered around with permutations in this test, and they all work as expected! 1a1b332. Does it make sense that this works? I assume candidate order must be preserved within the delegation mappings.

The one exception is that resolvers built into the stitched schema will override the primary delegations, which seems... fine.

So I guess, rename all this to @primary and call it a day?

@yaacovCR
Collaborator

Yes, last root field wins, so changing the order would probably be enough.

Does @primary imply that there could be some "other" scenario where a secondary is used? I think @canonical still makes sense.
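
A toy model of "last root field wins" and of how moving canonical definitions to the end changes the outcome, reusing the Users/Reviews services from the earlier example. `SubschemaSketch` and `mapRootFields` are hypothetical; the real delegation mapping is not shown in this thread, and resolvers defined on the stitched schema itself are ignored here.

```typescript
// Hypothetical subschema summary for illustration only.
interface SubschemaSketch {
  name: string;
  rootFields: string[];          // Query fields this subschema defines
  canonicalRootFields: string[]; // subset flagged as canonical
}

// Maps each root field name to the subschema that would receive the delegation.
function mapRootFields(subschemas: SubschemaSketch[]): { [fieldName: string]: string } {
  const targets: { [fieldName: string]: string } = {};
  // Pass 1: plain last-one-wins across subschemas in their given order.
  for (const subschema of subschemas) {
    for (const fieldName of subschema.rootFields) {
      targets[fieldName] = subschema.name;
    }
  }
  // Pass 2: canonical definitions are applied afterwards, as if sorted to the end.
  for (const subschema of subschemas) {
    for (const fieldName of subschema.canonicalRootFields) {
      targets[fieldName] = subschema.name;
    }
  }
  return targets;
}

const usersService: SubschemaSketch = {
  name: 'users',
  rootFields: ['users'],
  canonicalRootFields: ['users'],
};
const reviewsService: SubschemaSketch = {
  name: 'reviews',
  rootFields: ['reviews', 'users'],
  canonicalRootFields: ['reviews'],
};

console.log(JSON.stringify(mapRootFields([usersService, reviewsService])));
// {"users":"users","reviews":"reviews"}
```

Without the canonical flags, the same input would send `users` to the reviews service, simply because it is listed last.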

@gmac
Contributor Author

gmac commented Jan 16, 2021 via email

@gmac
Contributor Author

gmac commented Jan 17, 2021

Docs are updated! Once again, I think we're in good shape here.

@yaacovCR yaacovCR merged commit ef14668 into ardatan:master Jan 17, 2021
@gmac gmac deleted the gm-merge-priority branch January 30, 2021 03:21